ggplot2 errors- we all get
them!theme_gray or theme_bw (or
theme_minimal)?These are important questions, and I want you to develop
(well-informed) opinions on these matters! 
This includes a reconstruction of Nathan
Yau’s hot dog contest example, as interpreted by Jackie Wirz, ported
into R and ggplot2 by Steven Bedrick for a workshop for the
OHSU Data
Science Institute, and finally adapted by Alison Hill for all you
intrepid Data-Viz-onauts!
First, we load our packages:
library(tidyverse)
library(extrafont)
library(here)
Next, we load some data. You can use the following chunk to load it in from a link:
hot_dogs <- read_csv("http://bit.ly/cs631-hotdog",
col_types = cols(
gender = col_factor(levels = NULL)
))
Or you can save the file at the link to a local CSV file. I did this
and saved my file in a folder called data, then built up
the file path to the CSV using here:
hot_dogs <- read_csv(here::here("static/labs/data", "hot_dog_contest.csv"),
col_types = cols(
gender = col_factor(levels = NULL)
))
Either way you do it, check it out once read in and make sure it looks like this!
glimpse(hot_dogs)
Rows: 49
Columns: 4
$ year <dbl> 2017, 2017, 2016, 2016, 2015, 2015, 2014, 2014, 2013, 2013, …
$ gender <fct> male, female, male, female, male, female, male, female, male…
$ name <chr> "Joey Chestnut", "Miki Sudo", "Joey Chestnut", "Miki Sudo", …
$ num_eaten <dbl> 72.000, 41.000, 70.000, 38.000, 62.000, 38.000, 61.000, 34.0…
hot_dogs
# A tibble: 49 × 4
year gender name num_eaten
<dbl> <fct> <chr> <dbl>
1 2017 male Joey Chestnut 72
2 2017 female Miki Sudo 41
3 2016 male Joey Chestnut 70
4 2016 female Miki Sudo 38
5 2015 male Matthew Stonie 62
6 2015 female Miki Sudo 38
7 2014 male Joey Chestnut 61
8 2014 female Miki Sudo 34
9 2013 male Joey Chestnut 69
10 2013 female Sonya Thomas 36.8
# … with 39 more rows
We’ll be wanting to somehow include information about whether a given
year was before or after the incorporation of the competitive eating
league, so let’s add an indicator field to the data using
mutate(). Also, the data’s a little sketchy pre-1981 and
for our purposes today we’ll be focusing on males only, so let’s do some
filtering too:
hot_dogs <- hot_dogs %>%
mutate(post_ifoce = year >= 1997) %>%
filter(year >= 1981 & gender == 'male')
hot_dogs
# A tibble: 37 × 5
year gender name num_eaten post_ifoce
<dbl> <fct> <chr> <dbl> <lgl>
1 2017 male Joey Chestnut 72 TRUE
2 2016 male Joey Chestnut 70 TRUE
3 2015 male Matthew Stonie 62 TRUE
4 2014 male Joey Chestnut 61 TRUE
5 2013 male Joey Chestnut 69 TRUE
6 2012 male Joey Chestnut 68 TRUE
7 2011 male Joey Chestnut 62 TRUE
8 2010 male Joey Chestnut 54 TRUE
9 2009 male Joey Chestnut 68 TRUE
10 2008 male Joey Chestnut 59 TRUE
# … with 27 more rows
Now let’s try making a first crack at a sketchy plot:
ggplot(hot_dogs, aes(x = year, y = num_eaten)) +
geom_col()
Note that our data is already in “counted” form, so we’re using
geom_col() instead of geom_bar().
ggplot(hot_dogs, aes(x = year, y = num_eaten)) +
geom_col() +
labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017")
Make 3 versions of the last plot we just made:
In the first, make all the columns outlined in “white”.
In the second, make all the columns outlined in “white” and filled in “navyblue”.
In the third, make all the columns outlined in
“white” and filled in according to whether or not
post_ifoce is TRUE or FALSE (use default colors for
now).
ggplot(hot_dogs, aes(x = year, y = num_eaten)) +
geom_col(colour = "white") +
labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017")
ggplot(hot_dogs, aes(x = year, y = num_eaten)) +
geom_col(colour = "white", fill = "navyblue") +
labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017")
ggplot(hot_dogs, aes(x = year, y = num_eaten)) +
geom_col(aes(fill = post_ifoce), colour = "white") +
labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017")
What if you want to change the legend in the last plot you made? Use google to figure out how to do the following:
Delete the legend title
Make the legend text either “Post-IFOCE” or “Pre-IFOCE”.
ggplot(hot_dogs, aes(x = year, y = num_eaten)) +
geom_col(aes(fill = post_ifoce), colour = "white") +
labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017") +
scale_fill_discrete(name = "",
labels=c("Pre-IFOCE", "Post-IFOCE"))
Now, let’s change the question a little bit. This looks at the creation of the IFOCE. What about the affiliation of the contestants? We’ll need some different data for this. Through the Magic Of Data Science™, we have dug that information up and put it into an expanded version of our CSV file available at http://bit.ly/cs631-hotdog-affiliated.
Let’s work with this new dataset! Do the following:
Read in the “hot_dog_contest_with_affiliation.csv” data file,
using col_types to read in affiliated and
gender as factors.
Within a mutate, create a new variable called
post_ifoce that is TRUE if year is greater
than or equal to 1997.
Also filter the new data for only years 1981 and
after, and only for male competitors.
hdm_affil <- read_csv("http://bit.ly/cs631-hotdog-affiliated",
col_types = cols(
affiliated = col_factor(levels = NULL),
gender = col_factor(levels = NULL)
)) %>%
mutate(post_ifoce = year >= 1997) %>%
filter(year >= 1981 & gender == "male")
hdm_affil <- read_csv(here::here("static/labs/data", "hot_dog_contest_with_affiliation.csv"),
col_types = cols(
affiliated = col_factor(levels = NULL),
gender = col_factor(levels = NULL)
)) %>%
mutate(post_ifoce = year >= 1997) %>%
filter(year >= 1981 & gender == "male")
glimpse(hdm_affil)
Let’s do some basic EDA with this new dataset! Do the following:
Use dplyr::distinct to figure out how many unique
values there are of affiliated.
Use dplyr::count to count the number of rows for
each unique value of affiliated; use ?count to
figure out how to sort the counts in descending order.
hdm_affil %>%
distinct(affiliated)
# A tibble: 3 × 1
affiliated
<fct>
1 current
2 former
3 not affiliated
hdm_affil %>%
count(affiliated, sort = TRUE)
# A tibble: 3 × 2
affiliated n
<fct> <int>
1 not affiliated 20
2 current 11
3 former 6
Now let’s plot this new data, and fill the columns according to our
new affiliated column.
ggplot(hdm_affil, aes(x = year, y = num_eaten)) +
geom_col(aes(fill = affiliated)) +
labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017")
Do the following updates to the last plot we just made:
Update the colors using hex colors:
c('#E9602B','#2277A0','#CCB683').
Change the legend title to “IFOCE-affiliation”.
Save this plot object as “affil_plot”.
affil_plot <- ggplot(hdm_affil, aes(x = year, y = num_eaten)) +
geom_col(aes(fill = affiliated)) +
labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017") +
scale_fill_manual(values = c('#E9602B','#2277A0','#CCB683'),
name = "IFOCE-affiliation")
affil_plot
The spacing’s a little funky down near the origin of the plot. The documentation
tells us that the defaults are c(0.05, 0) for continuous
variables. The first number is multiplicative and the second is
additive.
The default was that 1.8 ((2017-1981)*.05+0) was added to the right
and left sides of the x-axis as padding, so the effective default limits
were c(1979, 2019).
Let’s tighten that up with the expand property for the
scale_y_continuous (we’ll also change the breaks for y-axis
tick marks here) and scale_x_continuous settings:
affil_plot <- affil_plot +
scale_y_continuous(expand = c(0, 0),
breaks = seq(0, 70, 10)) +
scale_x_continuous(expand = c(0, 0))
affil_plot
But now the plot looks like it is wearing tight pants.
Let’s loosen things up a bit by updating the plot coordinates.
Use coord_cartesian to:
Set the x-axis range to 1980-2018
Set the y-axis range to 0-80
Using coord_cartesian is the preferred layer here
because “setting limits on the coordinate system will zoom the plot
(like you’re looking at it with a magnifying glass), and will not change
the underlying data like setting limits on a scale
will.”
limits unless you really know what you are
doing! Most of the time, you want to change the coordinates instead.
affil_plot <- affil_plot +
coord_cartesian(xlim = c(1980, 2018), ylim = c(0, 80))
affil_plot
Let’s change some key theme settings:
affil_plot +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text = element_text(size = 12)) +
theme(panel.background = element_blank()) +
theme(axis.line.x = element_line(color = "gray80", size = 0.5)) +
theme(axis.ticks = element_line(color = "gray80", size = 0.5))
Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
By default, plot titles in ggplot2 are left-aligned. For
hjust:
0 == left0.5 == centered1 == rightWe could also save all these as a custom theme. We are not fans of
the default font, so we are also going to change this. To do this, you
need to install the (extrafont package)[https://github.com/wch/extrafont] and follow its setup
instructions before doing this next step.
hot_diggity <- theme(plot.title = element_text(hjust = 0.5),
axis.text = element_text(size = 12),
panel.background = element_blank(),
axis.line.x = element_line(color = "gray80", size = 0.5),
axis.ticks = element_line(color = "gray80", size = 0.5),
text = element_text(family = "Lato") # need extrafont for this
)
affil_plot + hot_diggity
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : no
font could be found for family "Lato"
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found
We could also use someone else’s theme:
library(ggthemes)
affil_plot + theme_fivethirtyeight(base_family = "Lato")
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : no
font could be found for family "Lato"
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : no
font could be found for family "Lato"
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found
affil_plot + theme_tufte(base_family = "Palatino")
The final thing we have to mess with is the x-axis ticks and labels.
We’ll do this in two steps, then override our previous layer
scale_x_continuous.
years_to_label <- seq(from = 1981, to = 2017, by = 4)
years_to_label
[1] 1981 1985 1989 1993 1997 2001 2005 2009 2013 2017
hd_years <- hdm_affil %>%
distinct(year) %>%
mutate(year_lab = ifelse(year %in% years_to_label, year, ""))
affil_plot +
hot_diggity +
scale_x_continuous(expand = c(0, 0),
breaks = hd_years$year,
labels = hd_years$year_lab)
Scale for x is already present.
Adding another scale for x, which will replace the existing scale.
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : no
font could be found for family "Lato"
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : no
font could be found for family "Lato"
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found
Don’t name your files “final” :)
All together in one chunk, here is our final (for now) plot! I’m also adding some additional elements here to show you options:
nathan_plot <- ggplot(hdm_affil, aes(x = year, y = num_eaten)) +
geom_col(aes(fill = affiliated)) +
labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017") +
scale_fill_manual(values = c('#E9602B','#2277A0','#CCB683'),
name = "IFOCE-affiliation") +
hot_diggity +
scale_y_continuous(expand = c(0, 0),
breaks = seq(0, 70, 10)) +
scale_x_continuous(expand = c(0, 0),
breaks = hd_years$year,
labels = hd_years$year_lab) +
coord_cartesian(xlim = c(1980, 2018), ylim = c(0, 80))
nathan_plot
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : no
font could be found for family "Lato"
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found
Adding some plot annotations rather than having a fill legend:
nathan_ann <- nathan_plot +
guides(fill = FALSE) +
coord_cartesian(xlim = c(1980, 2019), ylim = c(0, 85)) +
annotate('segment', x=1980.75, xend=2000.25, y= 30, yend=30, size=0.5, color="#CCB683") +
annotate('segment', x=1980.75, xend=1980.75, y= 30, yend=28, size=0.5, color="#CCB683") +
annotate('segment', x=2000.25, xend=2000.25, y= 30, yend=28, size=0.5, color="#CCB683") +
annotate('segment', x=1990, xend=1990, y= 33, yend=30, size=0.5, color="#CCB683") +
annotate('text', x=1990, y=36, label="No MLE/IFOCE Affiliation", color="#CCB683", family="Lato", hjust=0.5, size = 3) +
annotate('segment', x=2000.75, xend=2006.25, y= 58, yend=58, size=0.5, color="#2277A0") +
annotate('segment', x=2000.75, xend=2000.75, y= 58, yend=56, size=0.5, color="#2277A0") +
annotate('segment', x=2006.25, xend=2006.25, y= 58, yend=56, size=0.5, color="#2277A0") +
annotate('segment', x=2003.5, xend=2003.5, y= 61, yend=58, size=0.5, color="#2277A0") +
annotate('text', x=2003.5, y=65, label="MLE/IFOCE\nFormer Member", color="#2277A0", family="Lato", hjust=0.5, size = 3) +
annotate('segment', x=2006.75, xend=2017.25, y= 76, yend=76, size=0.5, color="#E9602B") +
annotate('segment', x=2006.75, xend=2006.75, y= 76, yend=74, size=0.5, color="#E9602B") +
annotate('segment', x=2017.25, xend=2017.25, y= 76, yend=74, size=0.5, color="#E9602B") +
annotate('segment', x=2012, xend=2012, y= 79, yend=76, size=0.5, color="#E9602B") +
annotate('text', x=2012, y=82, label="MLE/IFOCE Current Member", color="#E9602B", family="Lato", hjust=0.5, size = 3)
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
Coordinate system already present. Adding new coordinate system, which will
replace the existing one.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
nathan_ann
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): no font could
be found for family "Lato"
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : no
font could be found for family "Lato"
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : no
font could be found for family "Lato"
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found
Finally, adding in another layer of data from female contestants:
hdm_females <- read_csv(here::here("data", "hot_dog_contest_with_affiliation.csv"),
col_types = cols(
affiliated = col_factor(levels = NULL),
gender = col_factor(levels = NULL)
)) %>%
mutate(post_ifoce = year >= 1997) %>%
filter(year >= 1981 & gender == "female")
Error: '/Users/paultesta/git/pols1600_site/data/hot_dog_contest_with_affiliation.csv' does not exist.
glimpse(hdm_females)
Error in glimpse(hdm_females): object 'hdm_females' not found
nathan_w_females <- nathan_ann +
# add in the female data, and manually set a fill color
geom_col(data = hdm_females,
width = 0.75,
fill = "#F68A39")
Error in fortify(data): object 'hdm_females' not found
nathan_w_females
Error in eval(expr, envir, enclos): object 'nathan_w_females' not found
And adding a final caption:
caption <- paste(strwrap("* From 2011 on, separate Men's and Women's prizes have been awarded. All female champions to date have been MLE/IFOCE-affiliated.", 70), collapse="\n")
nathan_w_females +
# now an asterisk to set off the female scores, and a caption
annotate('text', x = 2018.5, y = 39, label="*", family = "Lato", size = 8) +
labs(caption = caption) +
theme(plot.caption = element_text(family = "Lato", size=8, hjust=0, margin=margin(t=15)))
Error in eval(expr, envir, enclos): object 'nathan_w_females' not found